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Abstract 

This  technical  report1  presents  a  framework  for  visual  servoing  that  guarantees  convergence 
to  a  visible  goal  from  most  initially  visible  configurations  while  maintaining  full  view  of  all  the 
feature  points  along  the  way.  The  method  applies  to  first  and  second  order  fully  actuated  plant 
models.  The  solution  entails  three  components:  a  model  for  the  “occlusion-free”  workspace; 
a  change  of  coordinates  from  image  to  model  coordinates;  and  a  navigation  function  for  the 
model  space.  We  present  three  example  applications  of  the  framework,  along  with  experimental 
validation  of  its  practical  efficacy. 


1A  version  of  this  manuscript  will  appear  in  IEEE  Transactions  on  Robotics  and  Automation.  This  version  contains 
many  details  which  were  omitted  from  the  final  paper. 
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Figure  1:  Simulation  results  for  three  different  artificial  potential  function  based  visual  servo  con¬ 
trollers:  Top:  based  upon  the  navigation  function  proposed  in  equation  (34);  Middle:  based  upon 
the  potential  function  in  equation  (3)  resulting  in  failure  due  to  self  occlusion;  Bottom:  based  upon 
the  potential  function  in  equation  (4)  resulting  in  failure  due  to  a  spurious  local  minimum. 

1  Introduction 

Increasingly,  engineers  employ  computer  vision  systems  as  sensors  that  measure  the  projection  of 
features  from  a  rigid  body  at  as  it  moves  in  some  scene.  Such  an  approach  presupposes  the  availability 
of  a  reliable  machine  vision  system  that  supplies  a  controller  with  the  image  plane  coordinates 
of  features  of  the  body.  The  machine  vision  system  must  incorporate  image  processing,  feature 
extraction  and  correspondence  algorithms  suitable  to  the  scene  in  view.2  For  vision-based  control, 
these  features  are  then  used  to  close  a  servo  loop  around  some  desired  visual  image.  Traditionally, 
visual  servoing  algorithms  impose  motion  upon  the  actuated  configuration  space  variables,  q  £  Q, 
so  as  to  align  the  image-plane  features,  y  €  y,  with  a  previously  stored  goal  image,  y* .  When 
the  mapping,  c  :  Q  — >  y,  from  configuration  variables  to  the  image-plane  is  one-to-one  in  some 
vicinity  of  the  goal  then  traditional  visual  servoing  generally  results  in  a  closed  loop  system  with  an 
asymptotically  stable  equilibrium  at  the  unique  pre-image  q*  =  c^1(y*).  Many  algorithms  of  this 
nature  have  been  proposed  and  implemented  with  the  result  of  large  basins  of  attraction  around  the 
equilibrium  in  the  presence  of  large  errors  in  sensor  and  robot  calibration.  Hutchinson  et.  al.  [15] 
provide  a  general  introduction  and  extensive  bibliography  to  this  approach. 

Simple  visual  position  regulation,  for  which  a  suitable  set  of  visible  features  contrast  well  their 
surrondings,  leaves  few  challenges:  high  performance,  dynamic  (albeit  local)  vision-based  controllers 
which  fit  neatly  into  the  framework  of  linear  control  have  existed  now  for  some  years  [4] .  This  simple 
and  complete  characterization  is  owed  entirely  to  the  local  nature  of  regulation  problems.  However, 
the  creation  of  a  richer  set  of  behaviors  whose  global  (and  hence  highly  nonlinear)  properties  are 
well  characterized  remains  a  significant  challenge  to  visual  servoing.  The  perspective  projection  of 
features  to  a  CCD  camera  does  substantial  violence  to  the  scene  being  viewed.  In  addition  to  the  loss 
of  depth  information  of  point  features,  the  transient  loss  of  features  (due,  for  example,  to  occlusions) 
can  cripple  any  control  system  which  does  not  explicitly  address  the  possibility.  Moreover,  there  are 
subtle  geometric  facts  that  constrain  our  designs  (whether  or  not  we  take  them  into  account!).  For 

“Of  course,  as  is  clear  from  the  extensive  literature  (e.g. [  1 0.  12]),  implementing  these  feature  extracting  “virtual 
sensors”  represents  a  huge  challenge  and  is  beyond  the  scope  of  the  present  paper. 


example,  given  the  (monocular)  image  plane  projection  of  a  set  of  three  rigidly  connected  points,  it  is 
well  known  that  there  exists  up  to  four  poses  of  a  camera  which  will  result  in  the  same  projection  [35]. 
On  the  other  hand,  simply  adding  a  fourth  point  over  constrains  the  motion  so  that  the  individual 
feature  point  trajectories  may  not  be  assigned  arbitrarily  as  they  are  algebraically  constrained. 

The  classical  approach  to  visual  servoing  attempts  to  impose  straight-line  trajectories  on  image 
feature  point  locations.  For  example,  suppose  the  camera  projects  four  feature  point  on  a  rigid  body 
controlled  by  a  6  DOF  robot.  The  configuration  space  of  the  robot  -  for  example,  its  6  joint  angles 
-  is  given  by  Q  C  R6,  and  image  space  is  the  four  image  plane  pairs,  namely  y  C  R8.  Locally,  then, 
the  camera  map  is  then  just  a  map  from  R6  to  R8,  namely 


V  i 


V  = 


2/4 


=  c(q),  where  yi  €  R2,  *  =  1, . . . ,  4 


(1) 


are  the  image  plane  feature  locations  of  the  four  features  being  observed.  The  traditional  (kinematic) 
visual  servoing  law  is  then 

q  = -J\q)(y -y*),  (2) 

where  J'  =  (JT  J)”1  JT  is  the  pseudo  inverse  of  the  camera  Jacobian  matrix,  J(q)  =  Dc(q). 

Traditional  visual  servoing  systems,  such  as  those  based  on  the  above  approach,  have  numerous 
limitations.  First,  they  result  in  a  local  basin  of  attraction  whose  extent  is  poorly  or  not  at  all 
characterized.  For  example,  the  incursion  of  spurious  (attracting)  critical  points  may  arise  when 
y  —  y*  aligns  with  the  the  null  space  of  jf  in  (2).  Consequently,  the  local  basin  of  attraction  around 
q*  may  exclude  seemingly  reasonable  initial  conditions  [3] .  The  second  failing  of  the  visual  servoing 
algorithms  proposed  to  date,  involves  their  vulnerability  to  transient  loss  of  features  —  either  through 
self-occlusions  or  departure  from  the  field  of  view  (FOV).  To  the  best  of  our  knowledge,  no  prior 
work  guarantees  that  these  obstacles  will  be  avoided.  Usually,  the  problem  of  visibility  obstacles  is 
ignored  in  analysis,  and  when  encountered  in  practice  necessitates  human  intervention.  However,  as 
we  will  show,  both  of  these  limitations  -  local  convergence  and  transient  loss  of  features  -  can  be 
overcome  quite  readily. 

Another  major  problem  is  that  most  visual  servoing  algorithms  do  not  specifically  address  dy¬ 
namics.  Of  course,  given  the  multitude  of  successful  inverse  dynamics  based  control  strategies, 
trajectories  generated  from  (2)  (or  any  other  kinematic  controller)  could  be  tracked  very  precisely 
with  a  high  performance  robot  control  system.  However,  such  control  techniques  require  precise 
parametric  knowledge  of  the  robot’s  kinematics  and  dynamics,  the  extra  complexity  of  which  seems 
superfluous  given  the  simple  end-point  convergence  objective  of  most  visual  servoing  algorithms.  In 
other  words,  the  specific  reference  trajectory  generated  by  kinematic  controllers  is  merely  a  means 
to  an  end,  and  tracking  that  trajectory  exactly  may  not  be  necessary.  By  contrast,  our  approach 
generates  controllers  capable  of  extremely  high  performance,  which  exhibit  global  convergence  to 
the  end-point  goal,  without  the  added  complexity  involved  in  precisely  tracking  a  (clearly  some¬ 
what  arbitrary)  reference  trajectory.  Even  though  our  algorithms  do  not  prescribe  specific  reference 
trajectories,  the  methodology  nevertheless  affords  certain  guarantees  on  the  trajectories  that  result. 
For  example,  features  remain  visible  throughout  the  trajectory  (even  in  the  presence  of  Newtonian 
dynamics) . 

methodology  also  reduces  to  first-order  settings,  in  which  case  the  more  traditional  approach  in 
the  robotics  literature  of  tracking  reference  trajectories  applies. 
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1.1  Image-based  navigation 


In  a  naive  attempt  to  improve  (2),  note  that  it  may  be  conceived  as  a  gradient  law  using  the  potential 
function 

v{y)  =  \YJ\\yi-ytf  (3) 

Z  i=  1 


and  letting  q  =  —  ( Vq(ip  o  c).  Suppose  the  four  features  are  coplanar,  e.g.  they  are  all  on 
the  same  face  of  a  polyhedral  body.  A  self-occlusion  occurs  when  the  plane  containing  the  features 
intersects  the  camera  pinhole  -  namely  the  4  points  project  onto  the  same  line  on  the  image  plane. 
To  avoid  this  scenario,  consider  a  naive  improvement  of  (3)  that  avoids  self-occlusion  by  “blowing 
them  up” ,  namely, 


<p(y)  ■= 


£i=i  II Vi  ~  V* 


n 


{*d,fc}e  r 


det 


Vj 

1 


Vk 

1 


1/2  ’ 


(4) 


where  T  =  {{1, 2,  3},  {1,  2, 4},  {1, 3, 4},  {2, 3, 4}}.  Because  the  features  are  coplanar,  the  denominator 
will  go  to  zero  as  the  features  become  collinear,  and  thus  the  gradient  will  point  away  from  the  self¬ 
occlusion  obstacle.  However,  as  can  be  seen  from  Figure  1,  even  though  self-occlusions  are  avoided, 
convergence  is  not  necessarily  achieved.  Other  similarly  naive  approaches  suffer  the  same  peril. 

Although  the  naive  potential  function  approach  above  neither  adequately  addresses  occlusions 
nor  guarantees  convergence,  one  suspects  that  some  appropriately  designed  potential  might  overcome 
these  serious  limitations  of  traditional  visual  servoing.  Indeed,  we  will  show  in  this  paper  that  the 
obstacles  presented  by  self  occlusion  and  a  finite  FOV  can  be  obviated  by  addressing  in  a  methodical 
fashion  the  relationships  between  the  domain  and  range  of  c  from  which  they  arise. 

Specifically,  we  introduce  a  framework  for  visual  servoing,  depicted  in  Figure  2,  yielding  feedback 
controllers  which  are 


1.  dynamic:  applicable  to  second  order  (Lagrangian)  as  well  as  first  order  (kinematic)  actuation 
models; 

2.  global:  guaranteeing  a  basin  of  attraction  encompassing  most  initial  configurations  that  main¬ 
tain  feature  visibility; 

3.  visibility- obstacle  free:  avoiding  configurations  that  lose  features  due  to  either  self  occlusion  or 
departure  from  the  camera  FOV. 


1.2  Related  work 

A  few  researchers  have  incorporated  Lagrangian  dynamics  into  the  visual  servo  loop.  Zhang  and 
Ostrowski  [36]  developed  a  controller  for  an  Unpiloted  Aerial  Vehicle  (UAV)  equipped  with  a  camera 
for  which  they  exhibit  a  diffeomorphism  between  the  centroid  and  radius  of  a  circle  in  the  image 
plane  and  the  position  and  orientation  of  the  UAV  relative  to  a  sphere  in  the  workspace.  Formulating 
the  dynamics  in  generalized  image-plane  coordinates  leads  to  a  feedback  linearized  controller  that 
accounts  for  the  mechanical  system  dynamics.  This  novel  contribution  does  not  address  the  notion 
of  “visibility”  which  we  find  central  to  visual  servoing. 

Many  implementations  in  the  literature  offer  strong  anecdotal  evidence  that  suggests  convergence 
for  visual  servoing  systems  is  robust  with  respect  to  large  parametric  uncertainty,  though  researchers 
rarely  establish  formally  large  basins  of  attraction  of  visual  servoing  systems  (parametric  uncertainty 
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Figure  2:  Our  approach  to  visual  servoing  requires  a  camera  map,  c  :  Q  — >  y,  which  takes  each 
configuration  q  £  Q  and  produces  an  image  plane  measurement  y  =  c(q)  £  y ,  as  described  in  Section 
2.  The  image  plane  is  then  deformed  into  a  model  of  the  “safe”  unoccluded  scene  Z,  through  a  change 
of  coordinates  g ,  as  described  in  Section  4.  Finally,  a  navigation  function,  <p  :  Z  — >  R,  which  encodes 
the  goal  as  the  unique  global  minimum,  is  used  to  construct  a  convergent  generalized  nonlinear  PD 
controller,  as  reviewed  in  Section  3. 


aside).  Malis  et.  al.  [22]  and  Taylor  and  Ostrowski  [32]  introduce  visual  servo  controllers  incorpo¬ 
rating  partial  pose  reconstruction  to  guarantee  convergence  even  in  the  presence  of  large  parametric 
uncertainty.  However,  these  contributions  do  not  address  second  order  dynamics  nor  guarantee 
that  transients  will  avoid  visibility  obstacles.  Rizzi  et.  al.  designed  a  globally  convergent  nonlin¬ 
ear  dynamical  observer  for  a  falling  point  mass  viewed  by  a  binocular3  camera  pair  [29].  Their 
observer  exploits  the  property  that  a  pinhole  camera  projects  lines  into  lines  to  map  the  observer 
feedback  into  the  image  plane.  This  “pinhole  property”  can  also  be  applied  to  binocular  quasi-static 
visual  servoing  [16]  to  yield  a  controller  with  a  large  basin  of  attraction.  Hashimoto,  et.  al.  [13] 
use  potential  functions  to  increase  the  domain  of  attraction  for  their  (quasi-static)  visual  servoing 
system. 

In  addition  to  our  preliminary  results  [6,  7,  8],  there  have  been  some  recent  efforts  to  address 
the  FOV  problem,  albeit  in  a  quasi-static  setting.  Malis  et.  al.  [22]  guarantee  that  a  single  base 
point  remains  within  the  FOV  while,  as  noted  above,  guaranteeing  convergence  for  a  large  basin 
of  attraction.  Corke  and  Hutchinson  [5]  make  no  formal  guarantees,  but  have  designed  a  clever 
“partitioned”  visual  servo  strategy  for  which  simulations  suggest  a  large  basin  of  attraction  while 
maintaining  all  features  within  the  FOV  boundary.  Both  of  these  contributions  are  in  a  quasistatic 
setting. 

''This  paper  makes  the  distinction  between  stereo,  wherein  a  camera  pair  has  nearly  the  same  perspective  of  a 
scene  (as  is  the  case  for  biological  systems  with  stereo  vision),  and  binocular  vision  in  general. 
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1.3  Visual  sensing  of  rigid  motion 

If  the  features  are  such  that  the  camera  map  c  is  injective,  each  measurement  y  “pins  clown”  a 
unique  location  q.  Hence  a  positioning  task  may  be  accomplished  with  image  measurements,  even 
in  the  presence  of  parametric  uncertainty  in  the  robot  and  sensor.  Injectivity  of  c  implies  that  the 
dimension  of  the  output  space  is  greater  than  or  equal  to  the  dimension  of  the  configuration  space, 
but,  of  course,  the  converse  is  not  true.  For  example,  the  perspective  projection  of  three  feature 
points  of  a  rigid  body  is  a  6-dimensional  measurement,  from  the  three  image  plane  {x,  y)  pairs,  but 
each  image  6-tuple  generally  corresponds  to  multiple  algebraic  solutions  for  the  position  of  the  rigid 
body  (for  a  nice  geometric  interpretation,  see  Wolfe  et.  al.  [35]).  However,  the  perspective  projection 
of  at  least  four  rigidly  constrained  points  in  general  position  on  a  body  uniquely  determines  its  pose 
(see  for  example  [27])  and  hence  there  exists  a  unique  pose  which  registers  with  each  such  projection. 

“Natural”  coordinates  for  visual  servoing  have  not  yet  emerged  in  any  generality.  In  the  example 
applications  to  follow,  we  identify  in  each  case  the  underlying  model  space,  construct  a  solution  for 
the  model,  and  furnish  a  change  of  coordinates  that  pulls  the  model  solution  back  into  terms  of 
the  physically  measured  features.  However,  we  are  not  yet  able  to  characterize  in  any  physically 
convenient  manner  the  general  properties  of  the  features  that  would  make  such  constructions  possi¬ 
ble.  The  UAV  work  described  above  [36]  represents  a  different  attempt  to  develop  natural  features 
for  visual  servoing.  Another  seemingly  promising  alternative  relies  on  projective  kinematics  [31], 
although  it  has  not  been  validated  empirically  and  currently  lacks  the  machinery  required  to  lift  it 
to  the  second  order  setting. 

1.4  Organization 

The  central  contribution  of  the  paper  is  found  in  Section  4  where  we  propose  a  novel  framework 
for  dynamic,  occlusion-free  global  visual  servoing.  We  show  how  to  apply  this  framework  using 
three  illustrative  examples  that  provide  insight  into  the  specific  geometries  of  some  prototypical 
visual  servoing  systems.  In  Section  5  we  present  our  empirical  results  for  two  of  the  example  setups. 
Finally  we  provide  some  concluding  remarks  in  Section  6. 

Before  we  can  proceed,  however,  we  must  introduce  our  sensor  in  Section  2,  and  then  review 
concepts  from  robot  control  in  Section  3  to  provide  a  theoretical  foundation  for  the  subsequent 
material. 


2  Sensor  Model 

To  sense  a  robot’s  configuration  with  a  camera,  we  seek  a  map,  c,  from  the  robot  configuration  space 
Q  to  an  appropriate  output  space  y,  which  depends  on  several  factors.  We  assume  that  a  camera 
may  be  modeled  as  map,  7 r,  from  three  space  to  the  image  plane.  The  feature  space,  PS,  varies 
from  problem  to  problem,  but  often  we  just  choose  point  features  in  Euclidean  space,  E3,  in  which 
case  PS  =  E3  x  •  •  ■  x  E3  (TV  times).  The  features,  p  £  PS,  are  rigidly  attached  to  the  end  effector  of 
the  robot.  Their  position  in  a  body  fixed  frame,  F&,  is  known  and  hence  their  location  in  a  camera 
fixed  frame,  Fc,  is  given  by  the  kinematics,  h  :  Q  x  PS  — >  PS. 

Suppose  P  =  {pi}f=i  is  our  feature  set.  Then  cp,  =  h(q ,  bpi )  where  bpi  is  the  location  of  the 
feature  point  in  the  body  fixed  frame  (assuming  we  are  dealing  with  point  features)  and  ‘pi  is  the 
same  point  in  the  camera  fixed  coordinate  frame.  The  composition  of  the  camera  model  with  the 
kinematics  as  applied  to  all  of  the  features  generates  the  camera  map  cp  :  Q  — >  y 

cp(q)  :=  [noh(q,pi)  •••  tt  o  h(q,pN)\  (5) 
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parameterized  by  the  (constant)  feature  locations  in  body  coordinates.4 


2.1  Camera  models 

A  well  documented  family  of  models  mapping  elements  of  projective  space,  P3,  to  the  projective 
plane,  P2,  admit  of  a  matrix  representation  [23]:  orthographic  projection;  weak  perspective  projec¬ 
tion;  affine  projection;  and  full  perspective  projection  —  in  order  of  increasing  accuracy.  Because 
we  are  interested  in  potentially  large  displacements  over  the  camera  field  of  view  (FOV),  we  employ 
a  full  perspective  camera  model  in  the  present  work.  Of  course,  more  complex  models  do  exist  for 
camera  projection,  e.g.  Tsai  [33]  presents  a  technique  for  calibrating  a  camera  to  account  for  radial 
lens  distortion.  However,  the  physical  camera  systems  used  for  experiments  in  the  present  work  are 
well  modeled  without  accounting  for  additional  complexities. 

Assume  that  there  is  a  camera  fixed  coordinate  frame,  Fc,  and  that  points  are  expressed  with 
respect  to  that  frame  and  that  the  ( x ,  y)-plane  of  Fc  is  parallel  to  the  image  plane  and  coincident 
with  the  optical  center  or  “pinhole.”  Projective  points,  p  G  P",  are  represented  in  homogeneous 
coordinates  as  n  +  1  vectors  where  not  all  elements  are  zero,  and  Euclidean  points  x  G  E"  are 
represented  in  homogeneous  coordinates  as  n  +  1  vectors  [  xT ,  1  ]T.  Note  that  while  E”  is  a  subset 
of  P”  ([2],  Chapter  3),  the  use  of  the  homogeneous  matrix  representation  plays  a  very  different  role 
in  the  two  cases.  Nevertheless,  in  an  abuse  of  notation,  we  will  use  the  homogeneous  representation 
for  both  situations  when  the  specific  space  of  interest  is  clear  from  the  context. 

A  simple  pinhole  camera  with  unit  focal  length,  A  =  1,  admits  a  matrix  representation  relative 
to  the  homogeneous  matrix  representation  of  P"  as  n  :  P3  — >  P2,  where 

'1  0  0  0' 

n=  o  i  o  o  .  (6) 

0  0  10 

The  restriction  of  the  camera  map  to  E”  C  P™  does  not  admit  of  a  matrix  representation  when  using 
the  homogeneous  matrix  representation  for  points  in  E".  Rather,  we  must  divide  the  last  entry  to 
normalize  so  that  the  pinole  camera  map  from  E3  to  E2  is  given  by  p  i— >  y(np),  where 

^)  =  yf=[S  S  A  (V 

To  account  for  focal  length  A,  pixel  scale  ax,ay,  offset  sx,sy  and  a  skew  factor  on  the  image 
plane  9  (which  is  usually  7r/2),  we  introduce  the  projective  transformation  A  :  P2  — >  P2 


-ax  ax  cos  0  sx 
A  :=  0  —ay/sm9  sy  , 

0  0  1 


cxx  —  A  ax 
OLy  —  A  CXy 


(8) 


The  5  parameter  matrix  A  is  the  so-called  intrinsic  parameter  matrix.  A  complete  model  for  a 
perspective  projection  camera,  C  :  P3  — >  P2  is 

C:=ATV=[A  0]  (9) 

or,  in  Euclidean  coordinates,  ir  :  E3  — >  E2 


7 r(p)  =  y(AIIp).  (10) 

4To  simplify  the  analysis,  we  assume  that  the  measurements  are  temporally  and  spatially  continuous,  the  validity 
of  which  depends  heavily  on  the  problem.  For  the  implementations  discussed  in  Section  5  we  believe  they  are  accurate 
assumptions. 
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In  the  case  that  the  camera  frame  Fc,  is  not  coincident  with  a  world  reference  frame  F„, ,  in  which 
the  points  are  expressed,  then  we  have  C  :=  AHCHW,  where  the  6  parameter  extrinsic  parameter 
matrix  CHW  is  the  transformation  from  the  world  frame  coordinates  to  camera  frame  coordinates. 

The  camera  map  (5)  depends  on  several  factors:  the  type,  number  and  configuration  of  features 
selected;  the  robot  kinematics;  the  camera  model.  Section  4.1  considers  a  planar  problem  in  which 
the  configuration  space  is  SE(2),  and  the  camera  map  is  defined  to  be  the  perspective  projection  of 
three  feature  points  on  the  body  onto  a  ID  camera,  hence  FS  =  E2  x  E2  x  E2  and  y  =  E1  x  E1  x  E1. 
Section  4.2  presents  an  architecture  in  which  a  3DOF  robot  moves  in  a  subset  of  T3,  and  the  camera 
map  is  the  projection  of  the  position  and  orientation  of  a  feature  on  the  end  effector,  J~S  =  TiE3, 
so  that  y  C  T\E2  ss  E2  x  S1.  Finally,  in  Section  4.3,  we  present  a  6DOF  visual  servoing  algorithm 
which  uses  the  projection  of  a  set  of  coplanar  points.  In  this  example  we  compute  the  inverse  of  the 
forward  camera  map  and  work  in  a  reconstruction  of  the  3D  space  for  visual  servoing,  while  still 
guaranteeing  dynamical  convergence  and  occlusion  obstacle  avoidance. 

3  Robot  Control  via  Navigation  Functions 

Assume  we  have  a  holonomically  constrained,  fully  actuated  robot  with  known  kinematics,  affording 
a  suitable  set  of  local  coordinates,  i  =  1, . . . ,  n,  and  denote  the  n— dimensional  free  configuration 
space  as  Q.  The  system  dynamics 

M(q)q  +  C(q,q)q  +  G(q)  =  t  +  Fext(q,q), 

may  be  found  using  Lagrange’s  equations  (see,  for  example,  [1,  9,  11,  25])  where  Fext  are  external 
forces  (such  as  friction)  which  do  not  arise  from  Hamilton’s  variational  principle  and  r  are  the  input 
torques.  The  camera  plays  the  role  of  output  map,  c  :  Q  — >  y.  We  assume  exact  knowledge  of 
the  kinematics  and  link  masses5  (hence,  the  gravitational  term  G  is  exactly  known)  as  well  as  the 
external  forces  Fext.  Letting  the  input  torque  be  r  =  u  —  Fext  +  G(q),  where  u  is  our  control  input, 
the  plant  equations  are,  in  state-space  form, 


X\ 

X2 

. 

M{x\)~1  (u  -  C(x i,  £2)2:2) 

V  =  c(x  i),  (11) 

where  x  =  [qT ,  qT]T  ■ 

3.1  Task  specification 

The  state  space  is  constrained  by  the  presence  of  forbidden  configurations,  the  obstacle  set  O  C  Q. 
The  free  space  is  defined  as  the  obstacle- free  configuration  space  V  =  Q  —  O,  and  safe  configurations 
V  C  V  form  a  compact  connected  differentiable  manifold  with  boundary.  The  positioning  objective 

O 

is  described  in  terms  of  a  goal  set  Q  CD ■  The  task  is  to  drive  q  to  Q  asymptotically  subject  to  (11) 
by  an  appropriate  choice  of  u  while  avoiding  obstacles.  We  restrict  our  attention  to  point  attractors, 
Q  =  {q*}.  Moreover,  the  basin  of  attraction  £  C  TV  must  include  a  dense  subset  of  the  zero  velocity 
section  of  TV,  so  that  we  may  guarantee  convergence  from  the  entire  configuration  space.  Obstacle 
avoidance  requires  that  the  trajectories  avoid  crossing  the  boundary  set  B  =  dV,  i.e.  q(t)  £  V ,  for 
all  t  >  0. 

5Generally  these  data  are  supplied  by  a  manufacturer,  while  the  link  inertias  tensors  remain  unreported  and  are 
generally  difficult  to  measure  exactly.  Notice  that  our  “generalized  PD”  approach  to  control,  below,  will  not  require 
the  precise  form  of  the  mass-inertia  matrix  M  or  the  Coriolis  term  C  that  would  necessitate  knowledge  of  these 
parameters. 
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3.2  Navigation  functions 

The  task  of  moving  to  a  goal  while  avoiding  obstacles  along  the  way  can  be  achieved  via  a  nonlin¬ 
ear  generalization  of  proportional-derivative  (PD)  control  deriving  from  Lord  Kelvin’s  century  old 
observation  that  total  energy  always  decreases  in  damped  mechanical  systems  [19].  Formally,  this 
entails  the  introduction  of  a  gradient  vector  field  from  a  “navigation  function,”  a  refined  notion  of 
an  artificial  potential  function,  together  with  damping  to  flush  out  unwanted  kinetic  energy.6 

3.2.1  Gradient  Vector  Fields 

Let  T>  be  a  compact  n— dimensional  Riemannian  manifold  with  boundary,  with  metric  (• ,  •),  and 
p  :  T>  — >  R.  be  a  C2  functional.  For  Lagrangian  control  systems  of  the  kind  introduced  above,  the 
Riemannian  metric  is  specified  by  the  kinetic  energy  tensor.  When  the  mechanical  system  is  so 
heavily  damped  that  kinetic  energy  is  negligible,  or  is  explicitly  controlled  in  “velocity  mode”  then 
the  plant  equations  may  be  considered  first  order,  and  any  Riemannian  metric  may  be  used.  In  these 
situations,  for  example  as  in  the  case  of  our  RTX  robot  discussed  in  5.3,  the  convergence  properties 
of  the  gradient  field  alone  yield  the  desired  result. 

Choose  a  local  coordinate  chart  ip  :  U  C  R"  — >  V.  The  gradient  in  local  coordinates  is  given  by 
the  vector  of  partial  derivatives  weighted  by  the  inverse  of  the  metric.  Suppose  q  £  R"  are  the  local 
coordinates  induced  by  ip.  In  an  abuse  of  notation,  we  write  p  o  ip(q)  =  p(q),  and  hence,  in  local 
coordinates 

Vp(q)  =  M-\q)  DTqp(q) 

where  ( Dqp)i  =  and  M  is  the  local  representation  of  the  Riemannian  metric.  Gradient  descent 
can  now  be  achieved  in  local  coordinates  via 

q=-M~\q)DTqp{q)  (12) 

A  smooth  scalar  valued  function  whose  Hessian  matrix  is  non-singular  at  every  critical  point 
is  called  a  Morse  function  [14].  Artificial  potential  controllers  arising  from  Morse  functions  im¬ 
pose  global  steady  state  properties  that  are  particularly  easy  to  characterize,  as  summarized  in  the 
following  proposition. 

Proposition  1.  (Koditschek,  [19])  Let  p  be  a  twice  continuously  differentiable  Morse  function  on  a 
compact  Riemannian  manifold,  V.  Suppose  that  'S/p  is  transverse  and  directed  away  from  the  interior 
of  V  on  any  boundary  of  that  set.  Then  the  negative  gradient  flow  has  the  following  properties: 

1.  V  is  a  positive  invariant  set; 

2.  the  positive  limit  set  of  V  consists  of  the  critical  points  of  p 

3.  there  is  a  dense  open  set  DcD  whose  limit  set  consists  of  the  local  minima  of  p. 

For  the  quasi-static  mechanical  systems,  this  observation  is  sufficient  to  guide  controller  design. 
However,  these  first  order  dynamical  convergence  results  do  not  apply  when  a  Lagrangian  system  is 
subject  to  such  a  potential  field.  We  now  briefly  review  machinery  to  “lift”  the  gradient  vector  field 
controller  to  one  appropriate  for  second  order  plants  of  the  kind  introduced  in  (11). 

6 A  good  survey  of  potential  field  methods  for  robot  navigation  is  given  by  ([21],  Chapter  7).  The  refinement  to 
navigation  functions,  first  articulated  by  Koditschek  and  Rimon  [19,  20,  28],  is  only  very  briefly  sketched  here. 


3.2.2  Second  order,  damped  gradient  systems 

Introducing  a  linear  damping  term,  yields  a  simple  “PD”  style  feedback,  in  local  coordinates, 

u=-  Dqip(q)T  -  Kd  q,  (13) 

that  is  appropriate  for  second  order  plants.  Lord  Kelvin’s  observation  is  now  relevant  and  it  follows 
that  the  total  energy, 

77  =  <p  +  k  where  k  =  \ qTM(q)q ,  (14) 

is  non-increasing. 

Unfortunately,  if  the  total  initial  energy  is  higher  than  the  energy  at  some  point  on  the  boundary 
dT>,  trajectories  may  intersect  the  boundary.  Fortunately,  further  refining  the  class  of  potential 
functions  will  enable  us  to  construct  controllers  for  which  the  basin  of  attraction  contains  a  dense 
subset  of  the  zero  velocity  section  of  V.  The  following  definition  has  been  adapted  from  [19]. 

O 

Definition  1.  Let  V  be  a  smooth  compact  connected  manifold  with  boundary,  and  q*  £V  be  a  point 
in  its  interior.  A  Morse  function,  tp  £  C2[V,  [0,1]]  is  called  a  navigation  function  (NF)  if 

1.  ip  takes  its  unique  minimum  at  p(q*)  =  0; 

2.  ip  achieves  its  maximum  of  unity  uniformly  on  the  boundary,  i.e.  dV  =  <p~l(l). 

This  notion,  together  with  Lord  Kelvin’s  observation,  now  yield  the  desired  convergence  result 
for  the  Lagrangian  system  (11). 

Proposition  2.  (Koditschek  [19])  Given  the  system  described  by  (11)  subject  to  the  control  (13), 
almost  every  initial  condition  qo  within  the  set 

£  =  {(q,  q)  £  TV  :  rj(q,  q)  <  1}  (15) 

converges  to  q*  asymptotically.  Furthermore,  transients  remain  within  V  such  that  q(t)  £  V  for  all 
t  >  0. 

Proposition  2  generalizes  the  kinematic  global  convergence  of  Proposition  1.  Note  that  for  the 
second  order  system  £  imposes  a  “speed  limit”  as  well  as  a  positional  limit,  since  the  total  energy 
must  be  initially  bounded. 

3.2.3  Invariance  under  diffeomorphism 

One  last  key  ingredient  in  the  mix  of  geometry  and  dynamics  underlying  the  results  we  present 
revolves  around  the  realization  that  a  navigation  function  in  one  coordinate  system  is  a  navigation 
function  in  another  coordinate  system,  if  the  two  coordinate  systems  are  diffeomorphic  [19].  This 
affords  the  introduction  of  geometrically  simple  model  spaces  and  their  correspondingly  simple  model 
navigation  functions. 


4  Navigation  Function  Based  Visual  Servoing 

We  wish  to  create  visual  servoing  algorithms  that  are  high  performance,  global  and  yet  safe  with 
respect  to  occlusion  obstacles  O  that  arise  due  to  the  finite  FOV  and  self-occlusions.  To  achieve  our 
objective  we  compute  the  visible  set  for  a  particular  problem.  This  is  the  set  of  all  configurations 
V  :=  Q  —  O  in  which  all  features  are  visible  to  the  camera  and  on  which  c  is  well  defined.  We  then 
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design  a  safe,  possibly  conservative,  subset  V  C  V  in  which  the  body  is  permitted  to  move.  The  set 
V  provides  additional  safety  with  respect  to  obstacles  and  possibly  simplifies  the  topology.  We  then 
define  the  image  space  X  =  c{T> )  C  y.  The  camera  map  must  be  a  cliffeomorphism  c  :  V  «  X.  For 
each  problem,  T>  is  analyzed  to  construct  a  model  space  Z  and  a  diffeomorphism  g  :  X  ss  Z.  For 

O 

simplicity,  we  restrict  our  attention  to  point  attractors  Q  =  {<?*},  q*  GT>  from  which  we  define  the 
goal  image  y*  =  c(q*). 

We  propose  a  new  framework  for  visual  servoing  that  incorporates  three  ingredients: 

1.  a  model  space,  Z,  for  the  “safe”  configurations,  V\ 

2.  a  navigation  function  <p  :  Z  —+  [ 0, 1],  for  the  model  space; 

3.  a  diffeomorphism ,  g  :  X  — >  Z,  from  the  image  space  to  the  model  space. 

These  three  ingredients  are  assembled  with  the  feedback  control  strategy  (13),  which  guarantees 
that  all  initial  configurations  within  V  dynamically  converge  to  the  goal  while  ensuring  occlusion- 
free  transients  with  respect  to  the  obstacle  set  O  and  hence  overcome  all  of  the  traditional  limitations 
of  visual  servoing  systems  listed  in  Section  ??. 

In  2D  NF-based  visual  servoing,  the  function  g  does  not  contain  a  copy  of  the  inverse  of  the 
camera  map.  In  3D  NF-based  visual  servoing,  however,  g  does  contain  a  copy  of  the  inverse  camera 
map,  c-1.  In  that  sense,  2D  algorithms  are  “simpler.”  On  the  other  hand,  2D  algorithms  require 
the  Jacobian  of  the  camera  map  c,  because  the  navigation  function  in  the  configuration  variables  is 
given  by  ip(q)  =  p  o  g  o  c(q)  and  hence  the  gradient  is  given  by 

DqpT  =  Dct  DgT  Dp1'. 

By  recourse  to  the  general  framework  outlined  above  we  develop  controllers  for  several  specific 
configurations  of  a  robot  and  monocular  camera  in  the  subsections  that  follow.  In  Section  4.1  and 
Section  4.2  we  present  two  example  systems  which  incorporate  2D  visual  servoing,  and  in  Section  4.3 
we  present  a  3D  visual  servoing  algorithm  for  a  6DOF  body.  Interestingly,  the  very  different  visual 
servoing  problems  of  Section  4.2  and  Section  4.3  share  a  common  model  space  Z  =  [—1,1]”  x  S1 
for  n  =  2  and  n  =  5  respectively,  so  Appendix  A  presents  a  new  navigation  function  for  the  more 
general  model  space  Z  =  [—1,1]"  x  Tm  for  all  n  and  m  in  N. 

4.1  Example  1:  Planar  body,  planar  camera 

In  this  section,  we  discuss  the  problem  of  visually  servoing  a  planar  rigid  body,  viewed  by  a  planar 
pinhole  camera,  as  depicted  in  Figure  3,  first  presented  in  [6]. 

Suppose  we  have  a  planar  rigid  body,  Q  =  SE(2),  with  three  distinguishable  collinear  feature 
points.  In  this  case,  we  will  adopt  a  representation  via  homogeneous  matrices  parameterized  in  local 
coordinates  by  a  position  and  orientation  q  =  [Tx,Ty,9]T ,  i.e. 


R  T 

0T  !  _ 

cos  9 

—  sin  9 

T*  ' 

H  = 

=  ip(q)  ■= 

sin  9 

0 

cos  9 

0 

T  „ 

1 

and  interpret  H  =  cHb  £  SE(2)  as  the  rigid  change  of  coordinates  from  the  body  to  the  camera 
frame.  We  conveniently  co-locate  the  rr-axis  of  the  body  with  the  edge  containing  the  feature  points, 
P,  so  that  in  body  coordinates 

bPi  =  [h  0  l]T  for  i  =  1,2,3 
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Figure  3:  The  setup  for  a  planar  pinhole  camera  and  planar  rigid  body  with  collinear  feature  points 
showing  three  typical  configurations  of  a  rigid  body  with  respect  to  the  camera.  Left:  The  features 
face  the  camera,  but  the  leftmost  point  lies  out  of  view.  Center:  Although  within  the  camera 
workspace,  the  body  occludes  the  features.  Right:  The  features  are  all  visible. 


where  l\  <  l2  <  I3,  and  in  camera  coordinates 

cPi  =  H  bpi  =  [Tx  +  U  cos  9  T y  +  li  sin  9  l] T 

The  body  y-axis  is  oriented  “into”  the  body,  as  depicted  in  Figure  3. 

The  camera  is  assumed  to  have  a  finite  FOV,  represented  by  two  angles  a\,a2  €  ( — 7t/2, 7t/2) 
measured  with  respect  to  the  camera  optical  axis  (y-axis).  The  FOV  angles  define  a  FOV  cone  which 
is  the  workspace  of  the  camera,  Wc  C  E2 

Wc  =  {p  €  E2  :  p2  >  0,arctan(pi/p2)  €  (ai,a2)}  ■ 

For  simplicity,  consider  a  pinhole  camera  (10)  with  unit  focal  length  ir  :  Wc  — >  E1  is  simply 

Ap)  ■=  [&  i]1  (16) 

where  p  is  a  point  expressed  with  respect  to  the  camera  frame.  The  “edge”  of  the  image  plane 
corresponds  to  the  two  points  ymin  =  tanai  and  ymax  =  tana2- 

4.1.1  The  camera  map  and  visible  set  for  planar  servoing 

The  total  workspace  W  C  SE(2)  is 

W:={ff£  SE(2)  :  Hbpi  £  Wc,i  =  1 . .  .3}  . 

In  other  words,  to  be  in  the  workspace,  all  features  must  project  to  the  interval  7r(yVc) 

E1,  as  depicted  in  Figure  3. 

The  set  of  configurations  “facing”  the  camera  is  simply 

T  :=  {H  e  SE(2)  :  w{H)  >  0}  where 

v(if )  =  (Ty  cos  9  —  Tx  sin  9)  , 

so  the  visible  set  is  defined  V  :=  T  fl  W. 

We  now  define  the  camera  map,  c  :  V  — >  E1  x  E1  x  E1 

c(H)  :=  [n(Hbpi)  n(Hbp2)  n(Hbp3)]  . 


=  (ymin,ymax)  C 


(17) 
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In  an  abuse  of  notation,  we  will  often  write  c(q)  when  using  local  coordinates,  as  a  stand  in  for 
co  4>(q). 

The  points  y  =  [3/1 , 2/2 , 2/3]  =  c(H)  are  related  to  the  feature  positions  along  the  face  of  the  body 
by  a  “homography”  (a  member  of  the  group  of  projective  transformations  of  the  line,  PL(1)  [10]), 

i.e. 


aiiyi  =  Z[li  l]T,  i  =  1,2,3,  where  (18) 

[cos#  Tx 
[sin#  T.J  ’ 


4.1.2  Diffeomorphism  to  the  image  plane  for  planar  servoing 

We  identify  the  output  space  y  =  E1  x  E1  x  E1  with  E3  in  the  natural  way.  Define 

lp={y  £  E3  :  y1  -  ymln  >  p0,  2/2  -  2/1  >  pi, 

2/3  -  2/2  >  P2,  2/max  —  2/3  >  P3  } 

where  Pi  >  0,  i  =  0, . . . ,  n.  When  p  =  [po, . . . ,  p3]T  =  0,  we  write  2o,  and  define 

J  ■=  To  =  {y  e  E3  :  2/min  <  2/1  <  2/2  <  2/3  <  2/max}  • 

The  map  c  (17)  is  a  diffeomorphism  from  V  to  its  image  c(V)  =  J,  the  proof  of  which  follows 
from  the  following  three  facts:  (i)  c  is  onto,  i.e.  c(V)  =  J ,  (ii)  c  is  one-to-one  and  (iii)  c  is  a  local 
diffeomorphism  -  i.e.  it  is  smooth  and  smoothly  invertible  -  at  each  point  in  V. 

(i)  One  readily  verifies  that  for  each  H  £  V,  y  =  c(H )  £  J ,  i.e.  c(V)  C  J .  Furthermore,  for  each 
three  vector  y  £  J ,  there  is  a  unique  (up  to  scale)  nondegenerate  homography,  Z .  which  relates  the 
correspondences  via7 

aiyi  =  Z[li  l]T  ,  i  =  1,2,3 

in  homogeneous  coordinates.  Since  Z  is  full  rank,  the  columns  are  linearly  independent  and  hence 
Z  may  be  parameterized  as  in  (18)  for  H  £  V  (since  if  H  ^  V  then  y  (ji  J). 

(ii)  Suppose  that  c  is  not  one-to-one,  i.e.  y  =  c{H\)  =  c(H2),  Hi,H2  £  V,  then  the  matrices 


cos  #1 

•  n  J-  1 

and  Z 2 = 

cos  #2  „ 

•  n  J-  2 

sm  #1 

sm  #2 

represent  the  same  homography,  namely  Z\  =  01Z2  ■  Since  the  first  column  of  each  matrix  has  unit 
length,  then  Z\  =  ±^2,  but  if  both  Ti  and  T2  are  in  front  of  the  camera,  then  (T;)^  >  0,  and  hence 
H\  =  H2  which  is  a  contradiction. 

(iii)  The  pinhole  camera  7 r  is  differentiable  everywhere  in  front  of  the  camera,  and  hence  c  is 
differentiable  on  V.  Direct  computation  reveals  that 

....  (h  —  1 2)  (I2  —  h )  (h  ~  h)  (Ty  cos  8  —  Tx  sin  9) 

\Dc(q)\  =  - p p - 7T 

(Ty  +  h  sin 6)(Ty  +  l2  sin#)2(Ty  +  l3  sin#)2 

_  /  m  (k  -  h)  (k  ^  h)  (h  -  h) 

(Ty  +  lisin6)(Ty  +  Z2  sin^)2(Ty  + /3  sin^)2 

which  is  different  from  zero  at  every  point  in  V,  and  hence  c  is  a  local  diffeomorphism  at  every  point 
in  V  (inverse  function  theorem). 

7A  projective  transformation,  Z  S  PL(1)  is  uniquely  determined  by  the  correspondence  of  three  distinct  points 
[26]. 
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4.1.3  Model  space  for  planar  servoing 

We  seek  a  compact  manifold  with  boundary  on  which  to  impose  a  navigation  function.  Note  that  V  is 
an  open  set.  Moreover,  V  is  not  compact  since  there  is  no  bound  on  the  magnitude  of  translational 
component,  and  hence  it  is  impossible  that  V  ss  Tq.  However  it  is  practical  to  impose  impose  a 
“collar”  around  the  points  on  the  image  plane  by  enforcing  pi  >  0  and  hence  require  that  they 
maintain  a  minimum  distance  from  one  another,  as  well  as  maintaining  their  distance  from  the  edge 
of  the  image  plane.  Note  that  Ip  C  J  if  pi  >  0,  and  hence 

V  =  c~1(Xp)  CV  (19) 

is  compact  (Munkres  [24],  Theorem  5.5). 

So,  we  identify  a  general  model  space  of  the  form 

Z  =  {z  G  I"  :  zi+ 1  -  Zi>  pi,  i  =  0,...,n}  (20) 

where  Zq,  zn+-\  and  pi  >  0,  i  =  0, . . . ,  n  are  suitable  constants.  For  planar  visual  servoing,  n  =  3. 


4.1.4  Navigation  function  for  planar  visual  servoing 


We  require  a  change  of  coordinates  from  the  image  plane  to  a  model  space  Z  for  T>.  In  our  case,  we 
choose  Z  =  Xp,  and  hence  g  is  the  identity  mapping.  For  the  model  space  (20)  we  refine  a  class  of 
navigation  functions  designed  in  a  separate  context  by  Koditschek  [18]. 

Proposition  3.  (Adapted  directly  [18])  The  objective  function 


p(z)  = 


z  —  z 


* \\2k 


nr=o(*+i  -ziY-pt 


is  convex  on 


Z  =  {z  eK"  :  zi+ 1  -  Zi>  Pi,  i  =  0,...,n} 

where  Zq,  zn+ 1  and  pi,  i  =  0, . . . ,  n  are  suitable  constants  and  k  >  (2 n  +  3)/2  in  which  case 

A/k 


V  ■= 


(i  +  Tpy/k 

is  a  navigation  function  on  Z . 

For  planar  servoing,  n  =  3  and  so  we  require  k  >  9/2,  hence 


(21) 


v{y)  =  \\y - y*\\2k /([(yi  - ymm)2  - Po]  [(2/2  -yi)2  -pi] 

[(2/3  -  V2?  -  pI]  [{y—-y.)2-Pl\) 

is  convex.  Moreover,  tp  given  by  (21)  is  a  navigation  function  on  Xp  and  <p  :=  <p  o  c  is  a  navigation 
function  on  T>. 


4.2  Example  2:  Buehgler  arm,  spatial  camera 

This  example  serves  two  purposes.  From  a  practical  standpoint,  it  has  allowed  us  to  explore  visual 
servoing  with  a  high  performance  experimental  apparatus  called  the  Buehgler  arm  built  previously  in 
our  laboratory  [34,  29]  which  we  augmented  with  two  high-speed  digital  cameras.  The  experimental 
results  are  shown  in  Section  5.  From  a  theoretical  standpoint,  it  will  allow  us  to  explore  servoing  on 
a  different  space  Z  =  [—1,1]”  x  Tm  with  n  =  2  and  rn  =  1,  with  a  high  performance  platform  that 
demands  a  “tunable”  navigation  function.  Moreover,  it  will  turn  out  that  6DOF  visual  servoing 
(Section  4.3)  can  be  cast  in  the  same  model  space  (with  n  =  5  and  m  =  1). 
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Figure  4:  Left:  The  Buehgler  arm  has  been  modified  with  an  “arrow”  feature  on  the  tip  of  the  arm, 
which  is  observed  by  a  perspective  projection  camera.  The  camera  image  is  segmented  to  extract 
the  arrow,  as  depicted.  Right:  The  surface  swept  out  by  the  tip  of  the  arm  for  q\  £  (— 7t/2,  7t/2), 
qi  £  (— 27t/3,  -0.l7r). 


Figure  5:  Left:  An  image  of  the  Buehgler  arm  taken  from  the  servo  camera.  The  tip  of  the  arm 
has  been  marked  with  a  “pointer.”  Right  The  segmented  image  showing  the  extracted  feature. 


4.2.1  Buehgler  arm  kinematics 

The  Buehgler  arm,  depicted  in  Figure  4,  has  three  actuated  revolute  degrees  of  freedom  parameter¬ 
ized  in  local  coordinates  by  angles  q=[qi1  q2 ,  <73  ] T  and  its  configuration  space  is  Q  =  S1xS1xS1  = 
T3.  Denote  the  homogeneous  transformation  from  the  gripper  frame  to  the  world  base  frame,  given 
in  [30],  as  wHg(q).  The  transformation  from  the  world  frame  to  camera  frame  is  CHW,  and  hence 
H  =  cHb  =  cHwwHg.  We  let  R  =  [ri,r2,r3]  and  T  denote  the  rotation  and  translation,  respectively, 
effected  by  H. 

We  affix  a  “pointer”  to  the  tip  of  the  arm,  which  we  model  as  the  position  and  unit  orientation 
of  a  vector  in  three  space,  namely8  (br,  bv )  £  TjE3  ss  E3  x  S2  =  TS.  The  total  forward  kinematic 
map  is  then  defined  hg  :  Q  x  TS  — ►  TS 


hs{q)  := 


'H(q)br 

rs{q) 

R{q)bv 

ys(q)_ 

where  (rg.  vg)  are  in  the  camera  frame,  and  the  subscript  denotes  dependence  on  an  important 
kinematic  parameter,  the  “shoulder”  offset,  6. 

We  consider  a  feature  point  at  the  tip  of  the  arm  (which  has  length  g) ,  oriented  in  the  y-direction 
of  the  gripper  frame,  namely 


(V»  =  ([0,  0,  0,  1]T,[0,  1,  0  ]T)  £  TS 

C  TX  is  the  unit  tangent  bundle  of  X  [14]. 
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4.2.2  The  camera  map  and  visible  set  for  the  Buehgler 

A  perspective  projection  camera  (10),  located  at  the  frame  Fc,  is  positioned  to  view  the  robot  end 
effector  as  depicted  in  Figure  4.  The  FOV  angles,  (an,  ai2, 021, 022)  G  (— 7r/2,  7t/2),  determine  the 
workspace  of  the  camera,  Wc  C  E  ,  in  a  fashion  analogous  to  the  planar  camera,  namely 


Wc  :={  p  G  E3  :  p3  >  0,  arctan  (p3/p3)  G  (an,  a  12), 

arctan(p2/p3)  G  (a2i,a22)  }  • 

The  edge  of  the  image  plane  is  defined  in  terms  of  the  “lower  left  corner”  ymin  g  E2  and  the  “upper 
right  corner”  ymax  g  E2,  the  image  plane  projections  of  the  FOV  edges. 

We  define  the  total  workspace  as 

W  :=  {q  G  Q  :  r5(q)  g  Wc}  (22) 


The  set  of  configurations  facing  the  camera,  those  for  which  the  feature  is  not  occluded  by  the  arm, 
is  defined  very  similarly  as  for  planar  visual  servoing.  Let  bn  =  [0,  0,  —  1  ]T  denote  the  normal 
vector  to  the  tip  of  the  arm,  facing  “into”  the  arm.  Then 

F:={q£Q-.rs(q)TR(q)bn>  0}  (23) 


And  finally,  we  have  V  :=  T  ft  W. 

The  map  c,  introduced  formally  below,  can  be  intuitively  described  as  follows.  A  “pointer”  is 
attached  to  the  tip  of  a  arm,  and  a  camera  sees  its  position  and  orientation  on  the  image  plane. 
Roughly  speaking,  the  waist  ( q\ )  moves  the  feature  in  the  image  plane  x  direction,  the  shoulder  (q2) 
moves  the  feature  in  the  image  plane  y  direction  and  the  wrist  (q3)  rotates  the  feature  on  the  image 
plane.  The  camera  is  positioned  so  the  feature  may  move  freely  over  the  entire  image  plane  and  may 
be  rotated  to  any  orientation  on  the  image  plane  at  any  base  point. 

It  turns  out  [30]  that  r$  and  Rbn  depend  only  on  the  “base”  (first  two)  configuration  variables. 
Hence,  according  to  (22)  and  (23),  the  visible  set  decomposes  as  V  =  1Z  x  S1,  i.e.  the  third  configu¬ 
ration  variable,  q3,  is  unrestricted  at  each  base  point  (91,(72)  G  1Z  C  T2. 

The  camera  map  is  given  by  the  projection  of  the  feature,  and  its  orientation  on  the  image  plane, 
namely 


c{q)  := 


n  o  rs{q ) 

Z  {[Dtt  o  rs(q)\  vs{q)} 


(24) 


where  Z  normalizes  the  vector  on  the  image  plane.  Hence,  the  function  c  yields  the  position  and 
orientation  of  our  projected  feature  on  the  image  plane,  i.e.  c  :  V  — »  V,  where  y  =  XjE2  «  E2  x  S1. 
We  used  the  symbolic  algebra  package  Mathematica  to  generate  a  relatively  simple  closed-form 
expression  for  c,  and  its  Jacobian  Dqc(q). 

In  our  experiments,  we  position  the  camera  so  that  all  the  rays  within  the  workspace  of  the  camera 
intersect  the  set  A4g  =  r$(7Z)  C  E3  transversely  once.  This  requires  the  camera  be  sufficiently  close 
to  enable  the  robot  tip  to  position  the  pointer  anywhere  on  the  image  plane.  Morever,  we  assume 
the  earner  is  positioned  so  that  1Z  does  not  contain  the  kinematic  singularity  that  occurs  when  the 
second  axis  is  pointing  straight  up  [30].  As  a  direct  consequence  we  have  i)  rg  :  1Z  «  Aig  is  an 
analytic  diffeomorphism  and  ii)  the  restriction  of  n  defined  by  composition  with  the  kinematics, 
7r  \rs(K)  '■  Ada  — >  E2,  is  an  analytic  diffeomorphism  as  well,  7r  o  r$  :  1Z  «  E2. 
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4.2.3  Diffeomorphism  to  the  image  plane  for  the  Buehgler 

The  image  space  is  defined 

Xp  :=  {  (y,  6)  G  TiE2  :  yfn  <  Vi  ±  Pi  <  y?™,  *  =  1,2} 

where  pi,  i  =  1,2  are  positive  constants  which  impose  a  “safe”  border  around  the  image  plane. 
When  pi  =  P2  =  0,  we  write  Xq.  Define 

J  :=  {y  e  E2 :  yfn  <  y,  <  2/“ax,  *  =  1,2}. 

o 

and  note  that  T\J  —  To-  In  Appendix  B,  we  show  that  c  (24)  is  a  diffeomorphism  c  :  V  «  T\J  in 
the  case  that  6  is  sufficiently  small,  which  we  now  simply  assume  to  be  true  (an  assumption  validated 
by  the  experiments). 

4.2.4  Model  space  for  the  Buehgler 

As  in  the  planar  example  (see  Section  4.1.3),  we  seek  a  compact  manifold  with  boundary  T>  on  which 
to  impose  a  navigation  function.  As  before,  this  is  done  by  taking  the  inverse  image  under  c  of  a 
compact  subset  of  the  image  plane,  namely 

V  =  c~1(Xp)  C  V  (25) 

which  again  as  before  is  a  compact  manifold  with  boundary. 

4.2.5  Navigation  function  for  Buehgler  servoing 

We  require  a  change  of  coordinates  from  the  image  plane  to  a  model  space  Z  for  V.  By  letting  g  be 
a  simple  (affine)  scaling  and  translation  of  the  coordinates,  the  set  Xp  is  diffeomorphic  to  the  model 
space  Z  in  (35)  with  n  =  2  and  m  =  1.  Hence,  according  to  Appendix  A, 

p  =  ip  o  c  (26) 

where  (p  is  given  by  (37),  is  a  navigation  function  on  X). 

4.3  Example  3:  Spatial  body,  spatial  camera 

Consider  a  free  convex  polygonal  rigid  body,  with  configuration  space  Q  =  SE(3),  and  let  P  = 
[  Pi,  •  •  •  ,  Pn  ],  Pi  G  E3,  be  a  set  of  coplanar  distinguishable  points  on  a  face  of  the  body.  Because 
the  points  are  coplanar,  they  are  in  a  2-dimensional  subspace  of  E3.  Attach  the  body-fixed  frame 
Ff,  such  that  the  (x,  i/)-plane  contains  the  feature  points,  and  the  2:-axis  is  normal  and  oriented 
toward  the  interior  of  body.  Hence  the  z-component  of  the  points  in  body  coordinates  is  zero,  so  for 
convenience  let 

bPn 

bB  =  kbP=[bbl  ■■■  bb±]  ,  \=  bpi2  GE2 

1 

'1  0  0  0' 

A=  0  1  0  0 

0  0  0  1 
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denote  the  “reduced”  homogeneous  body  coordinates,  where  bpn  and  bpi2  are  the  x  and  y  components, 
respectively,  of  bP.  Note  that  bP  =  AT  bB.  Let 

pc  =  arg  min  max  ||  pt  -p\\,  p  =  max  ||  pt  -  pc  || , 

e2  *6r  *er 

r  =  {i, ...  ,tv} 

denote  the  center  and  radius  of  the  smallest  sphere  containing  all  the  points.9  Now,  assume  without 
loss  of  generality  that  F&  is  located  at  pc,  i.e.  bpc  =  [  0,  0,  0,  1  ]T.  Finally,  we  let  the  homogenous 
matrix  H  =  cHb  G  SE(2)  denote  the  rigid  change  of  coordinates  from  the  body  to  the  camera  frame. 
Once  again  R  =  [n,r2,r3]  and  T  denote  the  rotation  and  translation,  respectively,  effected  by  H. 

4.3.1  The  camera  map  and  visible  set  for  spatial  servoing 

The  visible  set  is  defined  analogously  for  the  spatial  case  as  it  was  in  the  planar  case  in  Section  4.1. 
The  visible  set  is  the  set  of  all  poses  in  SE(3)  such  that  the  feature  points  are  within  the  FOV,  and 
the  body  is  facing  the  camera.  The  set  of  configurations  facing  the  camera  is  defined 

T  :=  {H  G  SE(3)  :  v(i7)  >  0}  where  v(H):=r%T. 

The  FOV  angles  crn,  cti2,  ct2i,  <222  determine  the  workspace  of  the  camera  >V0  C  E3 

Wc  ■=  {p  G  E3  :  p3  >  0,  arctan  (pi/p3)  G  (an,fti2), 
arctan (p2/p3)  G  (021,022)}- 

depicted  in  Figure  6.  The  total  workspace  W  C  SE(3)  is  then 

W:=  {H  G  SE(3)  :  HbPl  G  Wc,  i  G  T} 

and  the  visible  set  is  defined  as  for  the  planar  case  V  :=  T  D  W. 

To  simplify  the  geometry,  we  restrict  the  visible  set  to  a  conservative  subset  V  C  V  with  respect 
to  the  FOV.  In  particular,  we  consider  all  translations  which  keep  the  point  pc  within  the  so-called 
“admissible  cone,”  Wp 

Wp  :=  {p  G  E3  :  Bp{p)  C  Wc} 

depicted  in  Figure  6.  In  other  words,  Wp  is  all  translations  that  keep  a  sphere  of  radius  p  completely 
within  the  FOV.  If  we  restrict  the  translation  of  our  rigid  body  to  be  within  this  set,  then  all 
the  feature  points  will  stay  in  the  FOV  regardless  of  the  body  orientation.  Hence,  the  restricted 
workspace 

W  :=  {H  G  SE(3)  :  T  G  Wp }  (27) 

is  combined  with  the  facing  set,  to  yield  the  conservative  visible  set  V'  :=  P  n  W'  C  SE(3) 

The  centroid  is  confined  to  move  in  an  open  solid  cylinder,  i.e.  R+  x  D2  where  D"  is  an  n- 
dimensional  open  disk.  Recall  that  the  body  coordinate  frame,  F&  is  attached  such  that  the  z-axis 
is  orthogonal  to  the  face  (facing  into  the  body)  and  the  (x,  y)-plane  contains  the  feature  points. 
Consider  the  fact  that  SO(3)  is  an  SO(2)  bundle  over  S2,  and  identify  the  orientation  of  the  z-axis 
with  the  base  point  in  S2.  The  requirement  that  the  body  faces  the  camera  is  a  constraint  on  the 
the  2-axis,  namely  that  it  always  has  a  positive  projection  of  onto  the  line-of-site.  This  yields  an 

9  Note  that  simply  computing  the  centroid  of  all  the  points  does  not  necessarily  give  the  center  of  the  smallest 
sphere  containing  all  the  points,  and  so  this  somewhat  more  awkward  definition  is  needed. 
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X 


Figure  6:  The  camera  workspace,  Wc  C  E3,  is  the  set  of  all  positions  that  are  within  the  field  of  view. 
It  is  parameterized  relative  the  camera  frame  by  the  angles  {cxi}f=1  of  the  four  FOV  bounding  planes. 
The  admissible  cone,  Wp  C  Wc ,  refers  to  all  positions  that  keep  a  sphere  of  radius  p  completely 
within  the  camera  workspace. 


open  hemisphere;  i.e.  a  diffeomorphic  copy  of  R2.  An  SO (2)  bundle  over  M2  is  diffeomorphic  to 
M2  x  SO  (2).  Therefore 

V'«t+xD2x  SO(2)  xl2«l5x  S1. 

A  perspective  projection  camera  (9),  located  at  the  frame  Fc,  observes  the  rigid  body  feature 
points,  y  =  P2  x  •  •  •  x  P2  (N  times).  In  homogeneous  coordinates,  we  may  write 

Y  =  c(H)  =  CHbP=  AUHATbB 


where  C ,  A1 II  are  given  in  (9)  and  note  that 


Z(H)  :=UHAt  =  n 


n  r2 

0  0 


r  3 
0 


n  r2  T 


Denote  the  columns  Y  =  [  yi,  ■  ■  ■  ,  tjn  ]  where  yi  —  CH  bbi  and  note  that  Y  and  bB  are  related  by 
the  matrix  AZ{H ),  namely 

Y  =  A  Z(H)  bB  (28) 

This  fact  will  provide  a  convenient  way  to  compute  c-1 


4.3.2  Computing  c  1  for  spatial  servoing 

Because  we  will  work  in  a  reconstructed  three  dimensional  space  to  perform  6DOF  visual  servoing, 
we  need  an  inverse  for  the  camera  map.  Computing  the  inverse  requires  that  a  map  be  injective 
onto  its  image  which  is  true  of  c  as  we  show  so  long  as  P  contains  at  least  four  points  in  general 
position  (i.e.,  having  the  property  that  no  three  are  collinear). 

We  must  first  show  that  c  is  well  defined  on  V.  Assume  H  £  V  C  P,  i.e.  that  v(H)  ^  0,  which 
implies  that  Z(H)  is  nonsingular.  Since  Z{H)  is  nonsingular,  AZ(H)  is  a  homography,  and  hence 
c(H)  is  well  defined. 
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Second,  we  must  show  that  each  H  yields  a  unique  Y.  Assume  Hi,H2  G  V  and  Hi  ^  H2.  Using 
a  contrapositive  argument,  assume  that  Y  =  c(H\)  =  c(H2).  Since  bB  has  the  property  there  are 
four  points  such  that  no  three  of  the  four  points  are  collinear  then  we  know  that  Y  has  the  same 
property.10  Hence  c(H\)  =  c(H2)  iff  AZ(Hi)  and  AZ(H2)  the  are  the  same  up  to  a  scale.  This 
implies  that  Z(H\)  and  Z(H2)  are  the  same  up  to  a  scale,  i.e. 

[rn  r  12  Ti]  =  /?  [?’2i  r22  T2] 

where  rl3  denotes  the  jth  column  of  the  ith  rotation  matrix.  But  (3=1  since  1|  =  1  and  (Tjjs  >  0. 
Hence  Z(Hi)  =  Z(H-2 ),  which  is  true  iff  Hi  =  H2.  This  is  a  contradiction,  so  c  is  injective. 

In  practice,  computing  the  inverse  of  c  is  very  straight  forward.  There  must  be  at  least  four 
point  correspondences  such  that  no  three  points  are  collinear  as  discussed  above.  Using  the  possibly 
redundant  and  noisy  data,  Y,  one  simply  computes  the  “best”  (in  a  linear  sense)  homography,  M, 
between  bB  and  Y  [10].  Subsequently,  using  an  approximation  for  the  camera  parameter  matrix, 
compute  G  =  A-1M.  With  noiseless  measurements  and  perfect  calibration 


H  =  Z~\G) 


G\  Cr2  G1XG2  G3 

IIGiH  HGdt  HGilP  II Gi  || 

0  0  0  1 


however,  due  to  calibration  errors  in  A  and  noise  in  Y,  G  will  not  be  exactly  in  the  form  of  Z(H). 
In  our  experiments  we  use  a  simple  heuristic  pseudo-inverse  H  =  Z'(G)  which  is  computed  by 
projecting  the  above  expression,  conceived  as  a  matrix  in  R4x4,  down  to  the  space  of  homogeneous 
transformation  matrices,  so  that  Z(7f)_1G  —  I  is  “small.” 


4.3.3  Model  space  for  spatial  servoing 

To  apply  the  machinery  in  Section  3,  we  restrict  T>  to  be  a  compact  subset  of  V' .  Consider  the  two 
planes  parallel  to  the  image  plane  given  by  z  =  d>min  and  z  =  <5max  such  that  they  each  intersect  Wp 
as  depicted  in  Figure  7.  The  bounded  workspace  is  defined  simply  to  be  the  compact  set 

m,P  :=  {p  €  E3  :  p  C  Wp,  Smin  <p3<  Sm ax}  (29) 

We  will  limit  the  translation  to  this  compact  subset.  Selection  of  i5max,min  is  arbitrary  provided  that 
Ws.p  is  well  defined.  Practically,  one  chooses  them  to  limit  the  range  of  depth  of  the  robot:  too  far 
away  and  it  becomes  more  difficult  to  sense  the  features;  too  close  and  we  risk  hitting  the  camera 
with  the  body.  The  safe  workspace  is  now 

W"  :=  {H  G  SE(3)  :  T  G  WS,P}  (30) 

To  parameterize  the  safe  set,  T> ,  we  use  the  translation  T  and  Euler  angles  ((/>,  ip,  9 ),  in  a  manner 
which  avoids  singularities.  In  particular  the  Euler  angles  (0,  0, 9)  have  the  3-axis  of  the  body  parallel 
to  the  translation,  i.e. 

h{ T,  <p,  ip,  9)  =  hi(T)h2(<P,  ip ,  9)  (31) 

e,xT  Tx(e2xT)  T  t> 

H«2XT||  ||Tx  (e2  xT)  I  PT  L 

0  0  0  lj 

Rx(<t>)Ry(iP)Rz(0)  O' 

0T  1 

10A  projective  transformation,  A  E  PL(2)  is  uniquely  determined  by  the  correspondence  of  four  points  in  general 
position  -  that  is,  no  three  points  may  be  collinear.  Furthermore,  given  four  points  with  the  property  that  no  three 
are  collinear,  their  image  under  A  has  the  same  property  [26]. 


M  T)  := 
h2{<p,  ip,  9)  := 
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Figure  7:  The  workspace  Wc  is  constrained  by  the  FOV  angles  {cti}f= 1  Left:  An  orthographic 
projection  of  the  workspace,  illustrating  the  admissible  cone ,  Wp,  and  the  bounded  workspace,  Ws,P- 
Right:  A  perspective  view  of  the  admissible  cone  and  the  bounded  workspace. 


where  Rx,  Ry  and  Rz  are  the  standard  x  —  y  —  z  Euler  angle  rotation  matrices.  Consider  a  compact 
subset  T'  C  T 

T'  :=  {H  £  SE(3)  :  -  i?max  <  4,  <  tfmax}  (32) 

where  H  is  parameterized  by  (31)  and  timax  <  tt/2.  We  then  have  V  :=  T'  n  W”.  Note  that  V  C  V' 
is  compact. 

We  need  a  mapping  from  T>  to  Z.  We  will  now  build  up  the  inverse  of  that  mapping  /  _1  :  [—1,1]  SxS1 
V.  The  first  piece  is  given  by  f^1  :  [—1,  l]3  — >  Ws,p 

fi(x)  =  b  +  x3v  +  xiei(miX3  +  d3)  +  X2e2(m2x3  +  d2) 

where  we  choose  the  parameters  b,v  £  R3,  mi,m2,di,d2  £  ffi.  so  that  f^1  is  a  diffeomorphism  from 
[— 1,  l]3  to  Ws,P ,  as  follows.  The  point  b  is  on  the  midpoint  of  the  line  segment  down  the  center  of 
W5jP,  as  shown  in  Figure  7,  and  v  is  a  vector  which  points  along  this  centerline.  To  find  b ,  intersect 
the  four  planes  of  Wp  with  a  plane  parallel  to  the  image  plane  at  z  =  (<5min  +  ^max)/2,  and  take 
the  midpoint  of  the  rectangle  that  results.  The  vector  v  is  found  by  subtracting  b  from  the  center 
of  the  rectangle  which  results  from  the  intersection  of  the  plane  2  =  5max  and  Wp.  Note  that  b  ±  v 
corresponds  to  the  center  points  of  the  rectangles  at  the  intersection  of  z  —  <5max,min)  respectively, 
and  Wp.  The  constants  mi,  m2,  d\  and  d2  are  all  selected  so  that  for  each  x3  £  [—1,1],  then 
(xi,x2)  £  [—1,  l]2  parameterizes  a  planar  cross  section  of  WgtP  parallel  to  the  image  plane.  To  select 
these  constants,  observe  the  (x,  z)-orthographic  projection  of  the  workspace  in  Figure  7.  The  value 
of  d\  is  shown  and  is  computed  by  intersecting  the  plane  z  =  (<5min  +  ^max)/2  with  Wp  and  taking 
1/2  of  the  ^-dimension  of  the  resulting  rectangle,  whereas  d2  is  1/2  the  y-dimension.  The  value  of 
mi  is  found  by  observing  that  X\  =  1  and  x2  =  0  corresponds  to  the  top  edge  of  Wp.  Likewise  for 
m2-  We  then  have 

f~1{x,9)  =  hi(fi1(x1,X2,x3))h2(x4‘d  max?  x5d 

max?  0)  (33) 

Note  that  computing  /  is  a  very  straightforward,  albeit  somewhat  tedious,  computation. 
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The  topology  of  V  is  borne  out  by  this  change  of  coordinates:  the  translation,  T,  is  free  to 
move  in  a  diffeomorphic  copy  of  [—1,  l]3,  and  the  first  two  Euler  angles  are  constrained  to  a  copy  of 
[— 1,  l]2,  while  the  last  is  free  to  move  in  the  entire  circle,  S1.  Moreover,  the  parameterization  was 
carefully  designed  to  have  no  singularities.  In  essence,  we  have  shown  that  the  model  space  (35)  for 
n  =  5  and  m  =  1,  namely  Z  =  [—1,  l]5  x  S1  is  diffeomorphic  to  T>  via  the  mapping  /  :  V  ss  Z  (33). 

4.3.4  Navigation  function  for  spatial  visual  servoing 

Let  g  =  f  o  c_1  :  X  — >  Z.  Then 

ip  =  p  o  g  o  c  (34) 

where  <p  is  given  by  (37)  for  n  =  5,  in  =  1,  is  a  navigation  function  for  spatial  visual  servoing. 


5  Empirical  Validation 

In  order  to  test  the  framework  proposed  in  Section  4  we  experimented  with  two  robotic  systems 
that  implement  the  latter  two  imaging  models  introduced  in  the  previous  section.  The  first  system 
is  the  custom  3D  OF  direct  drive  Buehgler  Arm  described  in  Section  4.2  to  test  a  fully  dynamical 
controller  (13)  based  on  the  NF  given  by  (26).  Our  second  set  of  experiments  employ  an  industrial 
6DOF  RTX  robot  from  Universal  Machine  Intelligence  to  test  a  kinematic  controller  (12)  using  the 
spatial  6DOF  NF  (34). 

5.1  Calibration 

Although  the  visual  servoing  methodolgy  confers  robustness  against  parameter  mismatch  in  practice, 
all  such  methods,  including  the  one  presented  in  the  paper,  require  at  least  coarse  calibration  of 
the  robot  and  the  camera.  For  the  RTX,  we  used  the  manufacturer  specified  Denavit-Hartenberg 
parameters  and  a  linear  method  ([10],  Section  3)  that  requires  a  set  of  point  correspondences  between 
points  in  space  and  their  respective  image  to  simultaneously  estimate  both  intrinsic  and  extrinsic 
camera  parameters.  To  obtain  the  correspondences,  a  feature  affixed  to  the  robot  end  effector  was 
moved  to  a  grid  of  positions  in  view  of  the  vision  system  which  extracted  an  image  plane  location 
for  each  feature  position  in  space.  For  the  Buehgler  setup  we  measured  the  paddle  length  g  and 
the  shoulder  offset  5  by  hand  and  proceeded  as  for  the  RTX  to  obtain  a  rough  estimate  of  the 
camera  parameters.  A  gradient  algorithm  based  on  a  simple  pixel  disparity  cost  function  refined  our 
parameter  estimates  for  the  11  camera  and  two  robot  parameters. 

5.2  The  Buehgler  Arm 

The  Buehgler  Arm  is  controlled  by  a  network  of  two  Pentium  II  class  PCs  running  LynxOS  (http:  / /www- 
lynx.com/),  a  commercial  real-time  operating  system.  The  two  nodes  communicate  on  a  private 
ethernet  using  the  User  Datagram  Protocol  (UDP).  The  first  captures  8bit  528x512  pixel  images 
at  100Hz  using  an  Epix  (http://www.epixinc.com/)  Pixci  D  frame  grabber  connected  to  a  DALSA 
(http://www.dalsa.com/)  CAD6  high-speed  digital  camera.  The  images  are  processed  to  extract 
the  location  (position  and  orientation)  of  the  feature  point  at  the  end  of  the  paddle.  The  second 
node  implements  servo  control  using  the  Trellis  (http://www.trellissoftware.com/)  motion  controller 
with  a  servo  rate  of  1kHz,  based  on  the  dynamical  controller  in  (13),  wherein  the  damping  term  is 
computed  from  encoder  data  using  finite  differencing  to  estimate  the  joint  velocities. 

Two  sets  of  experiments  were  conducted  using  the  appropriate  NF  (26),  implemented  with  two 
different  gain  settings  (i.e.,  assignments  for  the  parameter  array  K  in  (36)  and  Kd  in  (13))  chosen 


21 


to  contrast  performance  resulting  from  a  locally  well  tuned  critically  damped  closed  loop  using 
relatively  high  gains,  as  against  a  “detuned”  low  gain  and  underdamped  circumstance.  Each  trial 
consisted  of  driving  the  feature  position  and  orientation  to  a  goal  (z* ,  £*)  from  some  initial  condition 
in  joint  space  {qo,qo)-  For  the  “tuned”  gain  experiments,  a  set  of  8  goal  locations  with  and  40  initial 
conditions  were  chosen  in  an  effort  to  “defeat”  the  controller.  In  particular,  initial  configurations 
were  chosen  near  the  edge  of  the  FOV,  with  initial  velocity  vectors  chosen  so  as  to  drive  the  robot 
out  of  the  FOV.  The  initial  conditions  were  prepared  with  a  simple  joint-space  trajectory  planner 
and  joint-space  PD  controller  that  drove  the  robot  to  the  starting  state  at  which  time  the  control 
switched  to  the  NF  based  controller.  In  other  words,  we  forced  the  robot  to  literally  “fling”  itself 
toward  an  obstacle  before  turning  on  our  visual  servoing  controller.  Both  the  goal  positions  and 
initial  conditions  where  chosen  to  span  the  visible  robot  workspace.  The  control  law  gains  were 
hand-tuned  to  provide  nearly  critically  damped  performance,  and  settling  times  on  the  order  of  a 
second.11  For  the  “detuned”  gain  experiments,  a  smaller  set  of  more  aggressive  initial  conditions  and 
goal  locations  was  used,  and  the  damping  gain  was  reduced  to  provide  “underdamped”  performance. 
There  were  4  goals  and  8  initial  conditions.  Figure  8  shows  the  the  error  coordinates  of  a  typical 
run  for  both  “tuned”  and  “detuned”  gains. 


Table  1:  Summary  of  results  for  Buehgler  arm. 


Normalized  Path  Length 

Goal 

Succ. 

Jnt.  Space 

Pix.  Space 

Gains 

# 

Rate 

Mean  (dev) 

Mean  (dev) 

Tuned? 

1 

40/40 

1.75  (0.12) 

1.63  (0.19) 

Yes 

2 

40/40 

2.38  (0.34) 

1.85  (0.35) 

Yes 

3 

36/40 

2.02  (0.87) 

1.65  (0.22) 

Yes 

4 

40/40 

2.07  (0.36) 

1.94  (0.50) 

Yes 

5 

40/40 

1.79  (0.36) 

1.64  (0.23) 

Yes 

6 

37/40 

2.15  (0.41) 

2.00  (0.62) 

Yes 

7 

36/40 

1.96  (0.85) 

1.63  (0.19) 

Yes 

8 

40/40 

2.05  (0.65) 

2.02  (0.75) 

Yes 

1 

6/8 

3.01  (1.93) 

3.59  (1.94) 

No 

2 

6/8 

1.64  (0.40) 

2.41  (0.53) 

No 

3 

5/8 

2.54  (1.86) 

3.91  (2.40) 

No 

4 

7/8 

2.97  (0.46) 

3.30  (0.65) 

No 

To  quantify  the  results  we  examine  similar  measures  as  for  the  RTX.  Table  1  summarizes  the 
results  of  the  our  experiments.  With  well  tuned  gains  the  controller  consistently  drove  the  feature 
to  the  goal  location  with  a  rate  of  success  of  97%.  Of  the  11  errors  one  was  due  to  exceeding  the 
robot’s  maximum  velocity,  one  to  a  software  driver  error,  and  one  to  a  feature  leaving  the  FOV  of  the 
camera  during  initialization.  The  remaining  8  failures  were  caused  by  not  allowing  enough  time  for 
convergence  as  each  experiment  lasted  6  seconds.  These  errors  generally  arose  when  the  robot  was 
close  to  a  saddle  of  the  NF  so  the  controller  was  slow  to  overcome  the  robot’s  unmodeled  friction. 
However,  with  “detuned”  gains  and  high  initial  velocity  the  feature  left  the  FOV  25%  of  the  time. 
These  failures  are  due  to  the  fact  that  the  initial  energy  of  the  robot  arm  caused  the  arm  to  escape  the 

11  Of  course,  the  allusion  to  linear  notions  of  damping  is  merely  an  intuitive  designer’s  convenience.  We  chose 
gains  to  ensure  the  local  linearized  system  was  critically  damped  at  the  eight  equilibrium  states,  and  then  tuned  up 
the  “boundary”  gains  to  force  reasonably  snappy  descent  into  the  domain  wherein  the  linearized  approximation  was 
dominant. 
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potential  well  -  by  using  a  lower  “detuned”  gain  on  the  potential  energy  feedback  term,  the  potential 
barrier  is  reduced.  (It  would  not  be  difficult  to  compute  the  invariant  domain,  as  in  [17]  -  these 
experiments  give  some  sense  of  the  relatively  graceful  performance  degradation  consequent  upon 
imperfectly  tuned  gains.)  Figure  8  shows  image-based  error  plots  and  the  image-plane  trajectory 
for  two  typical  runs. 

To  determine  the  accuracy  with  which  the  feature  reached  the  goal,  the  mean  pixel  error  is  given 

by 

256 y/ (Zfinal  -  Z*)T(zfinal  -  Z*)  +  (^)2(Cfinal  -  C*)2 

so  the  model  coordinates  are  scaled  to  be  commensurate  with  pixel  error.  As  can  by  seen  in  Figure  8 
(bottom  right)  the  mean  errors  are  in  the  neighborhood  of  1  to  2  pixels  over  each  of  the  eight  goal 
positions. 

Table  1  shows  the  path  length  taken  by  the  robot  compared  to  the  straight  line  path  length 
in  both  joint  and  pixel  space.  The  results  indicate  that  in  joint  space  our  navigation  function 
produced  results  within  a  factor  of  2.5  of  the  straight  line  distance.  We  attribute  a  large  portion 
of  this  additional  path  length  to  the  fact  that  the  large  initial  velocities  caused  “curviness”  to  the 
trajectories. 

Finally,  we  designed  our  controller  to  have  a  very  rapid  and  dextrous  response.  The  Buehgler 
arm  has  a  mass  in  excess  of  lOOKg  making  precise,  quick  and  efficient  movement  quite  challenging. 
Figure  8  (top  right)  shows  our  navigation  based  controller  produced  a  one  second  or  less  five  percent 
settle  time  for  seven  of  the  eight  primary  goal  positions. 

5.3  The  RTX  Arm 

The  RTX  is  commanded  through  the  serial  port  of  a  single  Pentium  PC  running  a  Linux  2.0  kernel 
(hard  real-time  is  not  required,  hence  the  standard  Linux  kernel  was  adequate).  The  PC  is  equipped 
with  a  Data  Translations12  DT3155  frame  grabber  connected  to  a  standard  30Hz  NTSC  video 
camera.  Using  MATLAB’s  C-language  API,  we  created  a  simple  interface  to  the  camera  and  robot 
accessible  from  within  the  MATLAB  programming  language. 

The  theory  presented  in  Section  4.3  presumes  the  configuration  space  to  be  Q  =  SE(3).  However, 
Q  is  parameterized  (locally)  by  the  robot  joint  angles  q  £  R6  through  the  forward  kinematics,  namely 
h  :  R6  — y  Q.  Of  course,  inevitably,  all  such  kinematic  parameterizations  introduce  singularities  that 
may,  in  turn,  inject  spurious  critical  points  to  the  gradient  fields,  necessarily  actuated  in  the  robot’s 
joint  space  rather  than  in  the  task  space,  as  our  theory  presumes.  Similarly,  since  our  formal 
theory  “knows”  only  about  visibility  bounds,  the  robot’s  unmodeled  joint  space  angles  limits  are 
not  in  principle  protected  against.13  However,  the  weight  of  experimental  evidence  we  present  below 
suggests  that  these  discrepancies  between  presumed  model  and  physical  reality  do  not  seriously 
imperil  the  practicability  of  this  scheme.  Regarding  the  first  discrepancy,  the  absence  of  stalled 
initial  conditions  suggests  that  any  critical  points  so  introduced  were  not  attractors.  Regarding  the 
second,  we  found  that  choosing  initial  and  goal  locations  away  from  the  joint  space  boundaries  was 
sufficient  to  avoid  running  into  the  end-stops. 

The  RTX  controller  employs  first  order  gradient  descent  on  the  navigation  function  in  (34). 
Because  the  RTX  arm  accepts  only  position  commands,  given  goal  and  current  images  with  feature 
points  extracted,  the  gradient  update  was  implemented  iteratively,  as  follows: 

12http : //www. datax. com/ 

12  Addressing  the  further  practical  realities  of  kinematic  singularities  and  robot  joint  space  limitations  falls  outside 
the  scope  of  the  present  paper  (and,  indeed,  is  not  even  addressed  at  all  in  the  traditional  visual  servoing  literature). 
In  principle,  the  NF  framework  would  be  relevant  to  these  problems  as  well:  joint  space  limits  are  analogous  to  the 
FOV  obstacles,  while  the  kinematic  singularities  are  akin  to  self-occlusion. 
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uk  <=  -Dj(fi=  -DT(f  O  h)(qk)  Djipz*(zk), 
qk+i  <=  qk  +  otUk  (where  a  is  the  step  size). 

Note  that  the  Jacobian  matrix  Dq(f  o  h)  can  be  decomposed  into  the  product  of  Df ,  which  maps 
from  the  body  screw  axis  to  the  model  space,  and  the  manipulator  Jacobian  Dh ,  which  maps  from 
the  robot  joint  space  to  the  body  screw  axis.  Indeed,  such  a  decomposition  implies  the  extrinsic 
parameters  are  not  needed  and  hence  one  may  move  the  camera  without  recalibrating. 

To  explore  our  algorithm,  we  conducted  a  set  of  experiments  in  which  58  initial  conditions  were 
tested  for  four  goal  locations,  giving  232  candidate  experiments.  Both  the  initial  conditions  and 
goal  locations  were  chosen  randomly  from  a  grid  of  4096  points  in  model  space  (configurations  near 
kinematic  singularities  and  not  within  the  robot  workspace  were  removed).  We  chose  many  initial  and 
goal  configurations  near  the  boundaries  of  the  workspace,  hence  of  the  232  candidate  experiments, 
an  additional  29  experiments  where  removed  as  the  features  began  outside  of  the  robot’s  workspace 
due  to  “sloppiness”  in  the  joints  of  the  robot  i.e.  there  is  a  lot  of  play  in  the  joints  so  a  specific  initial 
condition  in  the  joint  space  may  not  be  the  same  place  in  SE(3)  in  subsequent  trials,  and  therefore 
some  initial  conditions  were  out  of  the  visible  workspace  at  the  start.  Initially,  the  robot  was  moved 
to  each  goal  location  to  capture  an  image  of  the  robot,  respecting  which  the  vision  system  stored 
the  desired  location  of  feature  points,  y*.  Figure  9  shows  the  pixel  errors  feature  trajectories  of  two 
typical  runs.  As  shown,  we  used  four  coplanar  feature  points  for  the  camera  map,  c  :  Q  — >  y. 


Table  2:  Summary  of  results  for  RTX  arm 


Normalized  Path  Length 

Goal 

Success 

Jnt.  Space 

Pix.  Space 

# 

Rate 

Mean  (dev) 

Mean  (dev) 

1 

49/51 

1.35  (0.39) 

1.33  (0.49) 

2 

47/47 

1.36  (0.29) 

1.27  (0.37) 

3 

55/57 

1.32  (0.23) 

1.27  (0.32) 

4 

47/48 

1.42  (0.36) 

1.32  (0.25) 

To  ascertain  the  performance  of  our  controller,  we  employed  several  metrics  described  below: 
success  and  failures,  efficiency  of  motion,  mean  pixel  error  and  setting  time. 

Table  2  shows  the  success  rate  of  the  various  goal  positions.  Of  203  trial  runs,  5  were  found  to 
have  failed.  All  5  failures  are  due  to  the  robot  not  converging  in  our  limit  of  30  iterations  (though 
after  inspecting  the  data  by  hand,  it  appears  that  the  robot  would  have  converged  if  given  a  few 
more  iterations). 

We  also  measured  the  “efficiency”  of  motion  relative  to  the  straight  line  distance  in  both  image 
and  cartesian  space.  The  metric  used  for  measuring  distances  in  the  configuration  space  was  the 
sum  of  the  angular  displacement,  scaled  by  body  length,  and  the  translational  displacements.  For 
all  of  the  runs,  both  image  and  cartesian  measures  indicated  that  the  path  length  was  around  1.4 
times  that  of  straight  line  distance.  See  Table  2. 

Using  the  root  mean  squared  average  pixel  error  measurement  given  by  yj j  Ylt—iiVi  —  Vi)2  we 
found  an  average  final  pixel  error  on  the  order  of  1-2  pixels  upon  convergence.  Figure  9  (upper  right) 
shows  the  mean  pixel  error  and  standard  deviation  for  each  of  the  four  unique  goal  positions. 

The  average  five  percent  setting  time,  shown  in  Figure  9  (lower  right),  was  approximately  10-14 
iterations  for  each  of  the  four  goal  locations,  averaged  over  all  successful  runs. 
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6  Conclusions 


This  paper  addresses  the  problem  of  driving  image  plane  features  to  some  goal  constellation  while 
guaranteeing  their  visibility  at  all  times  along  the  way.  We  cast  the  problem  as  an  instance  of 
generalized  dynamical  obstacle  avoidance,  thereby  affording  the  use  of  navigation  functions  in  a 
nonlinear  PD-style  feedback  controller. 

We  demonstrate  the  applicability  of  this  framework  in  three  different  examples,  including  exten¬ 
sive  empirical  results  in  the  two  more  complex  cases.  The  two  experimental  systems  confirmed  the 
practicability  of  theoretical  framework:  the  custom  3DOF  Buehgler  Arm  and  the  6DOF  commercial 
RTX  arm.  For  the  Buehgler,  our  experiments  suggest  that  the  navigation  function  based  controller 
indeed  drives  the  feature  to  within  a  few  pixels  of  the  goal  in  all  cases  where  the  initial  energy  did 
not  overwhelm  the  potential  energy  well.  The  kinematic  experiments  with  the  RTX  validated  our 
6DOF  task-space  servo  architecture.  In  both  cases  our  results  show  systems  with  large  basins  of 
attraction  that  both  avoid  self  occlusion  and  respect  FOV  constraints. 

The  efficacy  of  this  approach  depends  upon  the  explicit  construction  of  a  model  space  together 
with  a  coordinate  transformation  back  to  the  image  plane  respecting  which  the  visibility  obstacles 
such  as  self  occlusions  and  the  FOV  can  be  represented  as  the  boundary  of  a  compact  manifold. 
Finding  a  good  general  image-based  coordinate  system  seems  challenging,  hence  we  have  had  to 
resort  to  similar  but  distinct  constructions  in  the  examples  discussed.  Ideally,  one  creates  an  image 
based  coordinate  system  using  direct  feature  information  as  in  the  planar  setting  of  Section  4.1  and 
the  dynamical  setting  of  Section  4.2.  In  many  cases,  it  may  be  helpful  to  “enlarge”  the  obstacles  to 
simplify  the  topology  as  in  the  spatial  setting  of  Section  4.3.  Conceivably,  the  homography  matrix 
(25)  computed  directly  from  image-plane  correspondences,  could  provide  the  basis  for  a  more  general 
visual  coordinate  system,  and  this  remains  an  interesting  avenue  for  future  investigation. 
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A  A  navigation  function  for  [— 1,  l]n  x  T? 


Let 

Z=  [-l,l]"xTm 

O 

for  some  to,  n  €  N  and  let  (z*,  (*)  GZ  denote  a  goal.  Consider  the  function  /  :  (—1, 1) 


(35) 


n  _ .  id m 


m  = 


L  (1-*?)’ 


(l-s£)5 
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Let  K  £  K"xn  be  a  positive  definite  symmetric  matrix  and  Ki  >  0,  i  =  1, . . . ,  m.  Define14 

771 

<p(z,o  '■=  y(z)TK/(z)+'52Ki(1~cos(&  _o)-  (36) 

i- 1 

Proposition  4.  The  objective  function 


<P  := 


V 

1  +  p 


(37) 


is  a  navigation  function  on  Z,  where  p  is  given  in  (36). 

Proof.  According  to  Definition  1,  ip  must  be  a  smooth  Morse  function  which  evaluates  uniformly  to 
unity  on  the  boundary  of  Z ,  and  has  as  the  unique  minimum. 

The  boundary  of  Z  is  given  by 


dZ  =  {(z,  C)  £  Z  :  Zi  =  ±1,  i  £  {1, . . . ,  ?r}}  . 

Clearly,  < p  evaluates  to  1  on  the  boundary,  i.e.  as  Zi  — >  ±1  then  ip  —*  1.  Furthermore,  V(z,£)  £  Z, 
Tp  >=  0.  Moreover,  Tp  =  0  iff  (,z,£)  =  (,z*,£*  +  5Zier  27rei)  =  (2*,C*)>  *-e.  the  (z*,£*)  is  the  global 
minimum. 

To  study  the  critical  points  of  ip,  we  need  only  study  those  of  Tp,  because  the  function  <7  :  [0,  00)  — > 
[0, 1)  given  by  a(x)  =  x/(l  +  x)  has  derivative  cr'(x)  =  1/(1  +  x)2,  which  does  not  introduce  any 
spurious  critical  points.  The  critical  points  of  <p  are  found  by  solving 


0  =  Dp  = 


[fTKDf,  m  sin(Ci 


Cl))''"  !  Km  sill((m  Cm)  (38) 


noting  that 


Df  =  diag  {/'}  ,  where  f'(z)  := 

(1  -  z2)3/2 

i  =  1, . . . ,  n. 


Since  Df  is  nonsingular  on  (—1, 1)",  Dp  =  0  iff  /  =  0  and  sin(Ci  —  C* )  =  0,  i  =  1, . . . ,  n  which  is  true 
iff  (2,  C)  =  (z*,(*  +  X)ier  7T e*),  T  £  powerset{l, . . . ,  m}.  There  are  2m  index  sets  which  enumerate 
all  possible  critical  points.  One  readily  verifies  that  the  Hessian  is  nonsingular  at  every  critical  point 
and  (z*,C*)  is  the  only  minimum.  Hence  p  is  a  Morse  function  which  evaluates  uniformly  to  unity 
on  the  the  boundary,  has  2m  —  1  saddles  and  the  goal  is  the  unique  minimum.  □ 


B  Buehgler  map  diffeomorphism 

As  described  in  Section  4.2,  the  physical  Buehgler  arm  is  a  three  degree  of  freedom  (all  revolute) 
kinematic  chain  whose  free  joint  space,  V  C  T3,  is  the  cross  product  of  the  (bounded)  angles  allowed 
its  “base”  (first  two)  joints,  1Z  together  with  with  the  (unbounded)  circular  motion  allowed  its  last 

14Since  use  local  coordinates  to  define  our  cost  function  (36),  we  require  Tp(z,C,)  =  Zpiz,  Q  +  \-  Uve, )  for  all 

possible  index  sets  T  £  powersetjl, . . .  ,m},  where  e,  is  the  unit  vector  whose  ith  element  is  1.  This  ensures  that  Tp  is 
continuous  on  Z. 
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joint  —  that  is,  V  :=  1Z  x  S1.  In  a  similar  vein  as  the  image  space,  we  identify  the  joint  space  with 
the  unit  tangent  bundle,  V  «  TilZ  C  T1Z. 

If  there  were  no  “shoulder”  offset  ( 6  =  0)  then  the  workspace  image  of  the  kinematic  base  map 
would  describe  a  metrically  spherical  surface  in  the  codomain  (three  space),  i.e.  ro(lZ)  =  Mo  C  S2, 
where  g  is  the  length  of  the  arm  [30],  and  the  final  joint  axis  would  permit  motion  of  the  pointer 
on  the  tip  of  the  arm  along  a  perfect  copy  of  the  unit  circle  tangent  to  the  surface  at  each  “base 
point”,  s  €  Mo-  For  this  simplified  version  of  the  kinematics,  then,  we  are  able  to  model  the  range 
space  of  the  full  kinematic  map  as  the  unit  tangent  bundle  over  the  base  image,  XjAIo  C  TjE3.  It 
follows  that  the  full  kinematic  map,  ho ,  may  be  modeled  as  the  restriction  of  the  tangent  map,  Tro, 
to  the  unit  tangent  bundle  over  the  base  joint  space,  ho  =  Tro  ItiK-  Specifically,  ho  is  a  fiber  map 
[14],  taking  its  domain  in  the  unit  tangent  bundle  over  the  base  joints,  TilZ  and  its  image  in  the 
unit  tangent  bundle  over  the  surface,  XjAIo- 

As  described  in  the  body  of  the  text,  we  have  so  arranged  the  camera  relative  to  the  Buehgler 
that  7r  o  rs  is  an  analytic  diffeomorphism  between  the  robot  base  joints,  1Z ,  and  the  camera  image 
plane,  J  C  E2.  In  general,  the  tangent  lift  of  a  diffeomorphism  between  two  spaces  is  itself  a 
diffeomorphism  (of  one  lower  grade  of  differentibility)  between  their  tangent  spaces,  and  it  follows 
that  T(n  o  rs)  is  an  analytic  diffeomorphism  between  T1Z  and  TE2. 

Now,  for  S  =  0  we  have 


7i  1Z  — ^  T\Mo 


T-k\tiMo 


TiE2, 


hence,  we  may  view  the  camera  map  as  the  composition  c  =  Tn  \txM0  °ho ,  which,  in  turn  may  be 
rewritten, 


c  =  Tit  ItiXo  °^o 

=  Ttt  ItiXo  °Tr0  |ti-r 
=  (Ttt  o  Tro) 

=  T(tt  o  r0)  \t!TI, 

as  the  restriction  of  the  diffeomorphism  T(tt  o  ro)  to  the  unit  tangent  bundles  over  the  domain  and 
range  of  the  base  map  composition,  (tt  o  ro).  In  general,  the  restriction  of  a  diffeomorphism  to 
a  smooth  submanifold  in  the  domain  must  remain  a  diffeomorphism  onto  its  image,  hence,  when 
d  =  0,  we  have  shown  that  c  is  a  diffeomorphism  between  the  configuration  space,  T{1Z  and  the 
image  space,  Tj  J. 

Since  the  property  of  being  a  diffeomorphism  is  an  open  property,  it  follows  that  c  remains  a 
diffeomorphism  for  small  enough  perturbations,  5  ^  0. 
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Figure  8:  Left:  The  pixel  error  and  corresponding  image  plane  feature  trajectory  for  a  typical  high 
gain  trial  on  the  Buehgler  robot.  Middle:  A  typical  low  gain  trial,  with  a  different  initial  and  goal 
locations.  A  two-dimensional  cross  section  (with  9  =  9*)  of  the  levels  sets  of  the  NF  is  superimposed 
on  the  image  plane  trajectory.  Right:  Buehgler  convergence  results.  Top:  Five  percent  settling 
time  for  each  of  the  eight  high-gain  goal  positions.  Bottom:  The  mean  pixel  error  for  each  of  the 
eight  goal  positions. 


30 


Pixel  Pixel  Goal  Position 


Figure  9:  Left:  The  pixel  error  and  corresponding  image  plane  feature  trajectory  for  a  typical  trial. 
Middle:  Another  typical  trial,  with  a  different  initial  condition  and  goal  location.  Right:  RTX 
convergence  results.  Top:  The  mean  pixel  error  for  each  of  the  four  goal  positions.  Bottom:  Five 
percent  settling  time  for  each  of  the  four  goal  positions. 
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